Zero-Shot Audio Classification Via Semantic Embeddings
Abstract
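In this paper, we study zero-shot learning in audio classification via semantic embeddings extracted from textual labels and sentence descriptions of sound classes. Our goal is to obtain a classifier that is capable of recognizing audio instances of sound classes that have no available training samples, but only semantic side information. We employ a bilinear compatibility framework to learn an acoustic-semantic projection between intermediate-level representations of audio instances and sound classes, i.e., acoustic embeddings and semantic embeddings. We use VGGish to extract deep acoustic embeddings from audio clips, and pre-trained language models (Word2Vec, GloVe, BERT) to generate either label embeddings from textual labels or sentence embeddings from sentence descriptions of sound classes. Audio classification is performed by a linear compatibility function that measures how compatible an acoustic embedding and a semantic embedding are. We evaluate the proposed method on a small balanced dataset, ESC-50, and a large-scale unbalanced audio subset of AudioSet. The experimental results show that classification performance is significantly improved by involving sound classes that are semantically close to the test classes in training. Meanwhile, we demonstrate that both label embeddings and sentence embeddings are useful for zero-shot learning. Classification performance is improved by concatenating label/sentence embeddings generated with different language models. With their hybrid concatenations, the results are improved further.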
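The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes a bilinear compatibility of the form score = aᵀ W s, where a is an acoustic embedding, s is a class's semantic embedding, and W is the projection matrix. The abstract does not specify how W is trained, so a random matrix stands in for it here. The function names, the number of classes, and the 300- and 768-dimensional semantic embeddings are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def compatibility(acoustic, W, class_embeddings):
    """Bilinear scores a^T W s between one acoustic embedding and every class.

    acoustic:         shape (d_a,)
    W:                shape (d_a, d_s), stands in for a learned projection
    class_embeddings: shape (n_classes, d_s)
    returns:          shape (n_classes,)
    """
    return acoustic @ W @ class_embeddings.T


def predict(acoustic, W, class_embeddings):
    """Pick the class whose semantic embedding is most compatible."""
    return int(np.argmax(compatibility(acoustic, W, class_embeddings)))


rng = np.random.default_rng(0)

# Assumed sizes: 128-d acoustic embeddings, 300-d label embeddings,
# 768-d sentence embeddings, 5 unseen classes (all hypothetical here).
acoustic_dim, label_dim, sentence_dim, n_classes = 128, 300, 768, 5

label_emb = rng.normal(size=(n_classes, label_dim))
sentence_emb = rng.normal(size=(n_classes, sentence_dim))

# Hybrid concatenation: one semantic vector per class built from both sources.
class_emb = np.concatenate([label_emb, sentence_emb], axis=1)

# Random placeholder for the projection; in practice it would be learned
# from the training classes.
W = rng.normal(size=(acoustic_dim, class_emb.shape[1]))

clip = rng.normal(size=acoustic_dim)  # placeholder acoustic embedding
scores = compatibility(clip, W, class_emb)
predicted = predict(clip, W, class_emb)
```

Because the unseen classes enter only through their semantic embeddings, new classes can be scored by adding rows to `class_emb` without changing `W`, which is what makes the zero-shot setting possible in this kind of framework.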
Similar resources
Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs
We consider the problem of zero-shot recognition: learning a visual classifier for a category with zero training examples, just using the word embedding of the category and its relationship to other categories, for which visual data are provided. The key to dealing with the unfamiliar or novel category is to transfer knowledge obtained from familiar classes to describe the unfamiliar class. In this...
Zero-Shot Learning for Semantic Utterance Classification
We propose a novel zero-shot learning method for semantic utterance classification (SUC). It learns a classifier f : X → Y for problems where none of the semantic categories Y are present in the training set. The framework uncovers the link between categories and utterances through a semantic space. We show that this semantic space can be learned by deep neural networks trained on large amounts...
Probabilistic Zero-shot Classification with Semantic Rankings
In this paper we propose a non-metric ranking-based representation of semantic similarity that allows natural aggregation of semantic information from multiple heterogeneous sources. We apply the ranking-based representation to zero-shot learning problems, and present deterministic and probabilistic zero-shot classifiers which can be built from pre-trained classifiers without retraining. We demon...
Zero-Shot Learning by Convex Combination of Semantic Embeddings
Several recent publications have proposed methods for mapping images into continuous semantic embedding spaces. In some cases the embedding space is trained jointly with the image transformation. In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage. Proponents...
Neighborhood Sensitive Mapping for Zero-Shot Classification using Independently Learned Semantic Embeddings
In a traditional setting, classifiers are trained to approximate a target function f : X → Y where at least a sample for each y ∈ Y is presented to the training algorithm. In a zero-shot setting we have a subset of the labels Ŷ ⊂ Y for which we do not observe any corresponding training instance. Still, the function f that we train must be able to correctly assign labels also on Ŷ . In practice,...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2021
ISSN: 2329-9304, 2329-9290
DOI: https://doi.org/10.1109/taslp.2021.3065234